We propose a distributed quantum computing (DQC) architecture in which individual small quantum computers are connected to a shared quantum gate processing unit (S-QGPU). The S-QGPU comprises a collection of hybrid two-qubit gate modules for remote gate operations. In contrast to conventional DQC systems, where each quantum computer is equipped with dedicated communication qubits, the S-QGPU pools these resources (e.g., the communication qubits) for remote gate operations, significantly reducing the cost of both the local quantum computers and the overall distributed system. Our preliminary analysis and simulation show that the S-QGPU's shared resources for remote gate operations enable efficient resource utilization. When not all computing qubits (also called data qubits) in the system require simultaneous remote gate operations, an S-QGPU-based DQC architecture demands fewer communication qubits, further decreasing the overall cost. Alternatively, with the same number of communication qubits, it can support a larger number of simultaneous remote gate operations more efficiently, especially when these operations occur in bursts.
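To make the pooling argument concrete, here is a minimal Monte Carlo sketch in Python (illustrative only, not the paper's simulation). It compares the worst-case provisioning a dedicated-per-node design must pay against the concurrent demand a shared pool actually has to cover. The node count, per-slot request probability, and coverage target are assumed values, requests are modeled as independent Bernoulli trials rather than the bursty traffic the abstract mentions, and each "remote-gate slot" stands for a fixed number of communication qubits, so comparing slot counts compares communication-qubit counts up to a constant.

```python
import random

# Minimal Monte Carlo sketch of the pooling argument above (illustrative
# only; all parameters are assumptions, not values from the paper).
# In each time slot, every node independently requests a remote two-qubit
# gate with probability P_REQUEST. A conventional design dedicates
# remote-gate resources (communication qubits) to every node, so it must
# provision for the worst case; an S-QGPU-style shared pool only has to
# cover the number of *concurrent* requests, which is usually far smaller.

NUM_NODES = 16       # small quantum computers in the system (assumed)
P_REQUEST = 0.2      # per-slot chance a node needs a remote gate (assumed)
TARGET = 0.99        # fraction of slots the shared pool should serve fully
TRIALS = 100_000

def concurrent_requests() -> int:
    """Remote-gate requests issued across all nodes in one time slot."""
    return sum(random.random() < P_REQUEST for _ in range(NUM_NODES))

def simulate() -> None:
    demand = sorted(concurrent_requests() for _ in range(TRIALS))
    average = sum(demand) / TRIALS
    covered = demand[int(TARGET * TRIALS)]  # demand met in TARGET of slots
    print(f"dedicated design provisions : {NUM_NODES} remote-gate slots (one per node)")
    print(f"average concurrent demand   : {average:.2f} remote-gate slots")
    print(f"pool covering {TARGET:.0%} of slots : {covered} remote-gate slots")

if __name__ == "__main__":
    random.seed(0)
    simulate()
```

Running this sketch prints the one-slot-per-node provisioning of the dedicated design next to the much smaller pool size that covers 99% of time slots, which is the qualitative gap the abstract attributes to resource pooling.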
Deploying Deep Learning Recommendation Models (DLRMs) involves parallelizing extra-large embedding tables (EMTs) across multiple GPUs. Existing works overlook the input-dependent behavior of EMTs and parallelize them in a coarse-grained manner, resulting in unbalanced workload distribution and excessive inter-GPU communication. To this end, we propose OPER, an algorithm-system co-design with OPtimality-guided Embedding table parallelization for large-scale Recommendation model training and inference. The core idea of OPER is to explore the connection between DLRM inputs and the efficiency of distributed EMTs in order to provide a near-optimal parallelization strategy for EMTs. Specifically, we conduct an in-depth analysis of various types of EMT parallelism and propose a heuristic search algorithm that efficiently approximates an empirically near-optimal EMT parallelization. Furthermore, we implement a distributed, shared-memory-based system that supports the lightweight yet complex computation and communication patterns of fine-grained EMT parallelization, effectively converting theoretical improvements into real speedups. Extensive evaluation shows that OPER achieves average speedups of 2.3× in training and 4.0× in inference over state-of-the-art DLRM frameworks.
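To make the "input-dependent" point concrete, below is a minimal greedy placement sketch in Python. It is not OPER's heuristic search or its distributed shared-memory system; it only illustrates the kind of decision an input-aware strategy makes, by taking hypothetical per-table lookup counts from profiled inputs and packing the hottest tables onto the least-loaded GPU, something a size-only, coarse-grained partition cannot do. The table names and counts are made up for illustration.

```python
from heapq import heapify, heappop, heappush

# Toy, input-aware embedding-table placement (illustrative only; NOT
# OPER's actual algorithm). Tables are assigned to GPUs greedily, heaviest
# first, always onto the currently least-loaded GPU, where "load" is the
# number of lookups observed for that table in a sample of inputs.

def balance_tables(access_counts: dict[str, int], num_gpus: int) -> list[list[str]]:
    """Assign each embedding table to a GPU, balancing estimated lookup load."""
    heap = [(0, g) for g in range(num_gpus)]   # (current load, gpu index)
    heapify(heap)
    placement: list[list[str]] = [[] for _ in range(num_gpus)]
    for table, load in sorted(access_counts.items(), key=lambda kv: -kv[1]):
        current, gpu = heappop(heap)           # least-loaded GPU so far
        placement[gpu].append(table)
        heappush(heap, (current + load, gpu))
    return placement

if __name__ == "__main__":
    # Hypothetical per-table lookup counts from profiling a batch of inputs.
    counts = {"emb_user": 9_000, "emb_item": 7_500, "emb_cat": 1_200,
              "emb_geo": 900, "emb_device": 400, "emb_hour": 300}
    for gpu, tables in enumerate(balance_tables(counts, num_gpus=2)):
        load = sum(counts[t] for t in tables)
        print(f"GPU {gpu}: {tables} (estimated lookups: {load})")
```

A real strategy such as OPER also has to account for the inter-GPU communication that each placement induces, which this toy balancer ignores.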